graph TD;
A[Combine AQI Data] --> B[combined_aqi_data.csv];
B --> C[Combine with HPI_master.csv];
C --> D[Combine with USIndicatorsEdited.csv];
D --> E[Fetch GSPC Closing Prices];
D --> F[Fetch Coordinates using Google Maps API];
B --> F;
C --> F;
E --> G[Final Dataset];
F --> G;
Assignment 3 - Data Analysis
Data Analysis Project
Hey! In this project, we’re going to explore the Housing Price Index (HPI) for the United States in the context of its Economic Indicators, S&P 500 Closing Values, Air Quality Index, and Geographical Locations. The data comes from several sources: the Federal Housing Finance Agency, the United States Environmental Protection Agency, the International Monetary Fund, Yahoo Finance, and the Google Maps API. This builds on work done previously as part of Assignment 1, where the HPI was analysed in the context of the US Economy.
This is the dataflow diagram of how the final dataset we operate on was formed. The process was carried out through several distinct steps in R, scripts for which are available in the associated GitHub repo. The final dataset was then saved, and all operations thereafter were performed directly on it.
This diagram outlines our step-by-step approach to compiling a robust dataset for analysis. We start by bringing together air quality index data, then combine it with housing price indices and economic indicators for the US. Next, we incorporate data on GSPC closing prices and geographical coordinates. This comprehensive dataset enables us to explore correlations between air quality, housing prices, economic factors, and geographic locations, providing valuable insights for our analysis.
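The combination steps above can be sketched with dplyr joins. This is a minimal illustration using toy stand-ins for the real CSVs; the join keys (`CBSA`, `Year`) are assumptions based on the columns that appear in the final dataset.

```r
library(dplyr)

# Toy stand-ins for combined_aqi_data.csv, HPI_master.csv, and
# USIndicatorsEdited.csv; real column names may differ.
aqi  <- data.frame(CBSA = "Denver, CO", Year = 2020, Median.AQI = 52)
hpi  <- data.frame(CBSA = "Denver, CO", Year = 2020, index_nsa = 310.4)
econ <- data.frame(Year = 2020, Unemployment.rate = 8.1)

final <- aqi %>%
  inner_join(hpi,  by = c("CBSA", "Year")) %>%  # attach housing price index
  inner_join(econ, by = "Year")                 # attach national indicators
final
# GSPC closing prices and Google Maps coordinates are appended in later steps.
```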
Libraries Used
#| echo: false
#| warning: false
#| output: false
library(htmlwidgets)
library(corrplot)
library(tidyr)
library(dplyr)
library(MASS)
library(car)
library(ggplot2)
library(plotly)
library(leaflet)
library(knitr)
library(kableExtra)
Data Read Operation
# Read files into a dataframe
final_data <- read.csv("C:/Users/ninja/OneDrive/Documents/GitHub/analytics-assignment3/final_data1.csv")
final_data_clean <- read.csv("C:/Users/ninja/OneDrive/Documents/GitHub/analytics-assignment3/final_data_clean_VF.csv")
final_data_averaged <- read.csv("C:/Users/ninja/OneDrive/Documents/GitHub/analytics-assignment3/final_data_averaged.csv")
Quantitative Analyses
Correlation Analysis
# Calculate correlation matrix
cor_matrix <- cor(final_data_clean[, sapply(final_data_clean, is.numeric)])
# Plot correlation matrix as a heatmap
corrplot(cor_matrix, method = "color", type = "upper", order = "hclust", tl.col = "black", tl.srt = 45, tl.cex = 0.35)
# Find variables with strong correlations
strong_correlations <- colnames(cor_matrix)[rowSums(abs(cor_matrix) > 0.7) > 1]
# Print the variables with strong correlations
print(strong_correlations)
[1] "Year"
[2] "Moderate.Days"
[3] "Unhealthy.for.Sensitive.Groups.Days"
[4] "Unhealthy.Days"
[5] "X90th.Percentile.AQI"
[6] "Median.AQI"
[7] "index_nsa"
[8] "GSPC.Close"
[9] "Gross.domestic.product..constant.prices"
[10] "Gross.domestic.product.per.capita..constant.prices"
[11] "Gross.domestic.product.per.capita..current.prices"
[12] "Gross.domestic.product.based.on.purchasing.power.parity..PPP..share.of.world.total"
[13] "Volume.of.imports.of.goods.and.services"
[14] "Volume.of.exports.of.goods.and.services"
[15] "group"
When multiple variables in a dataset are strongly correlated, it creates a tricky situation for statistical analysis called multicollinearity. This essentially means that it’s hard to tell how each individual variable contributes to the outcome we’re interested in, whether it’s air quality or economic factors like GDP. With such tangled relationships, the results of our analysis become less reliable. It’s like trying to separate threads that are all tightly woven together. This can lead to inflated errors and confusing interpretations of our findings. To tackle multicollinearity, we need to carefully choose which variables to include in our analysis, transform the data if needed, or use specialized techniques to untangle the relationships between variables. Doing so ensures that our models are accurate and trustworthy, allowing us to draw meaningful conclusions from our data.
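As a quick illustration of that first screen, run here on the built-in `mtcars` data rather than our dataset, we can list the predictor pairs whose absolute pairwise correlation exceeds 0.7:

```r
# Toy multicollinearity screen on mtcars: list predictor pairs with |r| > 0.7.
cm  <- cor(mtcars)
idx <- which(abs(cm) > 0.7 & upper.tri(cm), arr.ind = TRUE)
pairs <- data.frame(var1 = rownames(cm)[idx[, "row"]],
                    var2 = colnames(cm)[idx[, "col"]],
                    r    = round(cm[idx], 2))
pairs  # e.g. cyl and disp are tightly linked
```

Pairs flagged this way are candidates for dropping, combining, or qualifying with VIF checks, as we do below for the regression model.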
Visualization of Trends over time
Before we proceed with modelling, let’s observe the trends of the variables we’re mapping in order to better understand them. This also helps us understand the distribution of the data in each of these variables.
#visualizations
# Create an empty list to store plots
plot_list <- list()
# Aggregate the data by year
aggregated_data <- final_data_clean %>%
group_by(Year) %>%
summarize(across(where(is.numeric), ~ mean(., na.rm = TRUE)))
# Convert data to long format
aggregated_data_long <- aggregated_data %>%
pivot_longer(cols = -Year, names_to = "Variable", values_to = "Value")
aggregated_data_long <- aggregated_data_long %>%
mutate(Year = as.numeric(as.character(Year)))
# Loop over each numeric variable and generate a separate interactive plot
for (col in unique(aggregated_data_long$Variable)) {
plot_data <- aggregated_data_long %>%
filter(Variable == col) # Select data for the current numeric variable
# Create the plot
plot <- plot_ly(plot_data, x = ~Year, y = ~Value, type = 'scatter', mode = 'lines') %>%
layout(title = paste("Trend over Time for", col), xaxis = list(title = "Year"), yaxis = list(title = col))
# Append the plot to the list
plot_list[[col]] <- plot
}
plot_list[[2]]
plot_list[[3]]
plot_list[[4]]
plot_list[[5]]
plot_list[[6]]
plot_list[[7]]
plot_list[[11]]
plot_list[[12]]
plot_list[[13]]
plot_list[[14]]
plot_list[[15]]
Multiple Regression with Step Evaluation and Multicollinearity Tests
Using stepwise regression with the stepAIC (Akaike Information Criterion) method allows for automated variable selection by iteratively adding or removing predictors to optimize model fit while penalizing for model complexity, enhancing interpretability.
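A minimal sketch of that procedure, shown on the built-in `mtcars` data rather than our dataset:

```r
library(MASS)

# stepAIC starts from the full model and adds/drops predictors in both
# directions until no single step lowers the AIC any further.
full <- lm(mpg ~ ., data = mtcars)
sel  <- stepAIC(full, direction = "both", trace = FALSE)
formula(sel)  # the predictors retained after stepwise selection
```

Setting `trace = FALSE` suppresses the step-by-step log; with the default `trace = 1` you can watch which variable each step adds or drops and the resulting AIC.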
# Remove CBSA, Year, and group columns
data_subset <- final_data_clean %>%
  dplyr::select(-CBSA, -Year, -group) # qualified to avoid select() masking by MASS
# Perform stepwise regression
stepwise_model <- stepAIC(lm(index_nsa ~ ., data = data_subset), direction = "both")
Multicollinearity
We’re now going to remove multicollinear variables from the model trained by stepAIC. We’ve identified a list of variables, and through trial and error we’ve been able to reduce the effect of multicollinearity in the model.
# Step 2: Check for multicollinearity
vif_values <- vif(stepwise_model)
print(vif_values)
Days.with.AQI
3.326653
Moderate.Days
5.754717
Unhealthy.for.Sensitive.Groups.Days
5.529663
Unhealthy.Days
4.668829
Max.AQI
1.564234
X90th.Percentile.AQI
7.933300
Median.AQI
7.522855
Days.CO
1.602317
Days.NO2
1.573584
Days.Ozone
2.352098
GSPC.Close
22.513277
Gross.domestic.product..constant.prices
9.546861
Gross.domestic.product.per.capita..constant.prices
142.623370
Gross.domestic.product.per.capita..current.prices
175.013429
Gross.domestic.product.based.on.purchasing.power.parity..PPP..share.of.world.total
38.509708
Inflation..average.consumer.prices
4.260568
Volume.of.imports.of.goods.and.services
11.107569
Volume.of.exports.of.goods.and.services
5.481977
Unemployment.rate
4.351870
Current.account.balance
7.891527
lon_numeric
1.103001
lat_numeric
1.104620
# Set a threshold for VIF values
threshold <- 10
# Identify variables with VIF above the threshold
high_collinearity_vars <- names(vif_values)[vif_values > threshold]
print(high_collinearity_vars)
[1] "GSPC.Close"
[2] "Gross.domestic.product.per.capita..constant.prices"
[3] "Gross.domestic.product.per.capita..current.prices"
[4] "Gross.domestic.product.based.on.purchasing.power.parity..PPP..share.of.world.total"
[5] "Volume.of.imports.of.goods.and.services"
# Remove variables with high collinearity from the model
final_model <- update(stepwise_model, . ~ . -
Gross.domestic.product.per.capita..constant.prices - GSPC.Close -
Gross.domestic.product.based.on.purchasing.power.parity..PPP..share.of.world.total )
# Check for multicollinearity again
vif_values <- vif(final_model)
print(vif_values)
Days.with.AQI
3.134972
Moderate.Days
5.749523
Unhealthy.for.Sensitive.Groups.Days
5.515527
Unhealthy.Days
4.665915
Max.AQI
1.556075
X90th.Percentile.AQI
7.775680
Median.AQI
7.477114
Days.CO
1.534515
Days.NO2
1.509965
Days.Ozone
2.327407
Gross.domestic.product..constant.prices
9.097613
Gross.domestic.product.per.capita..current.prices
2.287780
Inflation..average.consumer.prices
1.764991
Volume.of.imports.of.goods.and.services
9.859812
Volume.of.exports.of.goods.and.services
3.046563
Unemployment.rate
1.892476
Current.account.balance
1.288833
lon_numeric
1.102851
lat_numeric
1.100786
# Set a threshold for VIF values
threshold <- 10
# Identify variables with VIF above the threshold
high_collinearity_vars <- names(vif_values)[vif_values > threshold]
print(high_collinearity_vars)
character(0)
# Summary of the updated final model
summary(final_model)
Call:
lm(formula = index_nsa ~ Days.with.AQI + Moderate.Days + Unhealthy.for.Sensitive.Groups.Days +
Unhealthy.Days + Max.AQI + X90th.Percentile.AQI + Median.AQI +
Days.CO + Days.NO2 + Days.Ozone + Gross.domestic.product..constant.prices +
Gross.domestic.product.per.capita..current.prices + Inflation..average.consumer.prices +
Volume.of.imports.of.goods.and.services + Volume.of.exports.of.goods.and.services +
Unemployment.rate + Current.account.balance + lon_numeric +
lat_numeric, data = data_subset)
Residuals:
Min 1Q Median 3Q Max
-147.53 -23.08 -2.71 14.87 457.04
Coefficients:
Estimate Std. Error t value
(Intercept) -5.839e+01 2.738e+00 -21.323
Days.with.AQI -8.665e-02 3.860e-03 -22.449
Moderate.Days 1.378e-01 7.061e-03 19.520
Unhealthy.for.Sensitive.Groups.Days 3.055e-02 2.344e-02 1.303
Unhealthy.Days 1.769e-01 3.865e-02 4.577
Max.AQI 2.636e-02 2.877e-03 9.162
X90th.Percentile.AQI 6.001e-02 1.725e-02 3.478
Median.AQI -6.531e-01 3.804e-02 -17.168
Days.CO 7.598e-02 5.859e-03 12.968
Days.NO2 1.458e-01 5.138e-03 28.384
Days.Ozone 8.508e-02 2.855e-03 29.799
Gross.domestic.product..constant.prices -2.834e-01 3.041e-01 -0.932
Gross.domestic.product.per.capita..current.prices 3.979e-03 1.912e-05 208.074
Inflation..average.consumer.prices 6.963e+00 1.698e-01 41.003
Volume.of.imports.of.goods.and.services -5.120e-01 9.243e-02 -5.539
Volume.of.exports.of.goods.and.services -2.897e-01 5.727e-02 -5.058
Unemployment.rate -2.610e-01 1.526e-01 -1.710
Current.account.balance -5.656e+00 1.473e-01 -38.411
lon_numeric -5.119e-01 1.185e-02 -43.203
lat_numeric -3.849e-01 3.481e-02 -11.057
Pr(>|t|)
(Intercept) < 2e-16 ***
Days.with.AQI < 2e-16 ***
Moderate.Days < 2e-16 ***
Unhealthy.for.Sensitive.Groups.Days 0.192564
Unhealthy.Days 4.73e-06 ***
Max.AQI < 2e-16 ***
X90th.Percentile.AQI 0.000505 ***
Median.AQI < 2e-16 ***
Days.CO < 2e-16 ***
Days.NO2 < 2e-16 ***
Days.Ozone < 2e-16 ***
Gross.domestic.product..constant.prices 0.351425
Gross.domestic.product.per.capita..current.prices < 2e-16 ***
Inflation..average.consumer.prices < 2e-16 ***
Volume.of.imports.of.goods.and.services 3.05e-08 ***
Volume.of.exports.of.goods.and.services 4.25e-07 ***
Unemployment.rate 0.087269 .
Current.account.balance < 2e-16 ***
lon_numeric < 2e-16 ***
lat_numeric < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 40.24 on 48501 degrees of freedom
Multiple R-squared: 0.6903, Adjusted R-squared: 0.6902
F-statistic: 5690 on 19 and 48501 DF, p-value: < 2.2e-16
This multiple regression model shows a strong overall fit, with a Multiple R-squared of 0.6903 (Adjusted R-squared 0.6902), indicating that approximately 69% of the variance in the dependent variable (index_nsa) is explained by the predictors included in the model.
Several predictors exhibit significant relationships with the dependent variable, including Days.with.AQI, Moderate.Days, Max.AQI, Median.AQI, Days.CO, Days.NO2, Days.Ozone, Inflation, Volume.of.imports.of.goods.and.services, Volume.of.exports.of.goods.and.services, Unemployment.rate, Current.account.balance, lon_numeric, and lat_numeric, as indicated by their low p-values (p < 0.05).
However, some predictors such as Unhealthy.for.Sensitive.Groups.Days and Gross.domestic.product..constant.prices do not show statistically significant associations with the dependent variable, suggesting their limited contribution to the model’s predictive power.
The relationships we’ve uncovered between various factors and our outcomes, like the Housing Price Index (HPI) or air quality index (AQI), provide us with some fascinating insights. For instance, when we look at metrics like the number of days with poor air quality or the severity of pollution, we get a clear picture of how it impacts housing prices and people’s overall well-being. On the economic side, indicators like inflation rates, trade volumes, and unemployment rates give us clues about how economic conditions shape housing markets.
What’s particularly interesting is that some factors, like days when the air quality is unhealthy for sensitive groups or GDP at constant prices, don’t seem to have a big impact on our outcomes. This tells us that while they’re important, there are other factors, like pollutant concentrations and broader economic indicators, that play a bigger role in influencing housing prices and air quality. These insights are crucial for policymakers and city planners as they work to address housing affordability and environmental concerns in our communities.
The residual standard error of 40.24 indicates the average deviation of observed values from the fitted values, providing a measure of the model’s goodness of fit.
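Given the spread in the residuals (min −147.5, max 457.0), a quick diagnostic pass is a natural follow-up. This sketch uses a toy fit on `mtcars`; in the report it would be run on `final_model` instead.

```r
# Standard lm diagnostics: residuals vs fitted, normal Q-Q, scale-location,
# and residuals vs leverage, laid out in a 2x2 grid. A long right tail like
# the one in our residuals would show up as curvature in the Q-Q plot.
fit <- lm(mpg ~ wt + hp, data = mtcars)  # stand-in for final_model
par(mfrow = c(2, 2))
plot(fit)
par(mfrow = c(1, 1))
```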
Qualitative Analyses
Geospatial Data Visualization to Discover Patterns and Relationships
Geospatial data visualization is a powerful technique used to uncover patterns, trends, and relationships within spatially referenced data. By mapping data onto geographical coordinates, we are able to visualize complex spatial distributions, identify clusters, and understand spatial interactions between variables.
#index_nsa
# Define a function to create Leaflet map with custom popup and heatmap
create_leaflet_map <- function(data) {
pal <- colorQuantile(c("green", "red"), domain = data$index_nsa, n = 5) # Define color palette
leaflet(data) %>%
addTiles() %>%
addCircleMarkers(
lng = ~lon_numeric,
lat = ~lat_numeric,
radius = 7,
color = ~pal(index_nsa), # Use color palette based on index_nsa values
fillOpacity = 0.7,
popup = paste(
"<b>CBSA:</b> ", data$CBSA, "<br>",
"<b>Year:</b> ", data$Year, "<br>",
"<b>Good Days:</b> ", data$Good.Days, "<br>",
"<b>Max AQI:</b> ", data$Max.AQI, "<br>",
"<b>Index NSA:</b> ", data$index_nsa
),
label = ~substr(data$CBSA, 1, 20), # Shorten CBSA name for label
labelOptions = labelOptions(noHide = FALSE)
) %>%
addLegend(
position = "bottomright",
pal = pal,
values = ~index_nsa,
title = "Index NSA"
)
}
# Call the function for each group
maps <- list()
for (i in unique(final_data_averaged$group)) {
data <- filter(final_data_averaged, group == i)
map <- create_leaflet_map(data)
title <- htmltools::h2(paste("Housing Price Index (Heatmap) across a Geographic Region - Group ", i),
style = "font-family: Arial; font-size: 18px; color: Black; text-align: center;")
maps[[i]] <- prependContent(map, title)
}
maps[[1]]
Housing Price Index (Heatmap) across a Geographic Region - Group 1
maps[[2]]
Housing Price Index (Heatmap) across a Geographic Region - Group 2
maps[[3]]
Housing Price Index (Heatmap) across a Geographic Region - Group 3
maps[[4]]
Housing Price Index (Heatmap) across a Geographic Region - Group 4
maps[[5]]
Housing Price Index (Heatmap) across a Geographic Region - Group 5
maps[[6]]
Housing Price Index (Heatmap) across a Geographic Region - Group 6
maps[[7]]
Housing Price Index (Heatmap) across a Geographic Region - Group 7
In our exploration of housing trends, we’ve uncovered a concerning pattern in the western part of the USA: steadily increasing housing costs. What’s driving this phenomenon? Well, it’s a mix of factors. Take a look at cities like San Francisco, Los Angeles, Seattle, and Denver. They’re buzzing with economic opportunities, drawing people in with promises of thriving job markets and enviable lifestyles.
But with all this growth comes a downside: a surge in demand for housing that surpasses what’s available. This mismatch between supply and demand has pushed housing prices through the roof, making affordable options scarce. And it’s not just about people wanting to move in – regulations on zoning and land use, coupled with limited space for development, have tightened the squeeze even further.
Adding to the mix are speculative real estate ventures and foreign investments pouring into urban areas, driving prices up even more. It’s a tough situation for folks, especially those with modest incomes, who find themselves grappling with the challenge of finding housing that fits their budget.
#aqi
# Define a function to create Leaflet map with custom popup and heatmap
create_leaflet_map2 <- function(data) {
pal <- colorQuantile(c("green", "red"), domain = data$Max.AQI, n = 5) # Define color palette
leaflet(data) %>%
addTiles() %>%
addCircleMarkers(
lng = ~lon_numeric,
lat = ~lat_numeric,
radius = 7,
color = ~pal(Max.AQI), # Use color palette based on Max.AQI values (the palette's domain)
fillOpacity = 0.7,
popup = paste(
"<b>CBSA:</b> ", data$CBSA, "<br>",
"<b>Year:</b> ", data$Year, "<br>",
"<b>Good Days:</b> ", data$Good.Days, "<br>",
"<b>Max AQI:</b> ", data$Max.AQI, "<br>",
"<b>Index NSA:</b> ", data$index_nsa
),
label = ~substr(data$CBSA, 1, 20), # Shorten CBSA name for label
labelOptions = labelOptions(noHide = FALSE)
) %>%
addLegend(
position = "bottomright",
pal = pal,
values = ~Max.AQI,
title = "Average Max AQI"
)
}
# Call the function for each group
maps2 <- list()
for (i in unique(final_data_averaged$group)) {
data <- filter(final_data_averaged, group == i)
map <- create_leaflet_map2(data)
title <- htmltools::h2(paste("Air Quality Index (Heatmap) across a Geographic region - Group ", i),
style = "font-family: Arial; font-size: 18px; color: black; text-align: center;")
maps2[[i]] <- htmlwidgets::prependContent(map, title)
}
maps2[[1]]
Air Quality Index (Heatmap) across a Geographic region - Group 1
maps2[[2]]
Air Quality Index (Heatmap) across a Geographic region - Group 2
maps2[[3]]
Air Quality Index (Heatmap) across a Geographic region - Group 3
maps2[[4]]
Air Quality Index (Heatmap) across a Geographic region - Group 4
maps2[[5]]
Air Quality Index (Heatmap) across a Geographic region - Group 5
maps2[[6]]
Air Quality Index (Heatmap) across a Geographic region - Group 6
maps2[[7]]
Air Quality Index (Heatmap) across a Geographic region - Group 7
From 1990 to 2023, the air quality index (AQI) in the USA has shown a concerning trend. Initially, there were positive outcomes as environmental regulations and technological advancements led to improvements in AQI. However, challenges emerged with the continued growth of urbanization, industrialization, and population density, particularly along borders and in densely populated areas.
Increased vehicular traffic, industrial emissions, and energy consumption have contributed to higher levels of air pollution, gradually deteriorating the AQI. Geographical factors such as proximity to major transportation routes and industrial zones have exacerbated air quality issues.
As pollution levels intensified, especially in heavily populated areas, addressing air quality concerns became increasingly urgent. Comprehensive regulatory measures, technological innovations, and public awareness campaigns are needed to mitigate the impact of air pollution on public health and the environment.
# Define the list of variables to include in the subset
variables_to_include <- c(
"GSPC.Close",
"Gross.domestic.product..constant.prices",
"Gross.domestic.product.per.capita..constant.prices",
"Gross.domestic.product.per.capita..current.prices",
"Gross.domestic.product.based.on.purchasing.power.parity..PPP..share.of.world.total",
"Inflation..average.consumer.prices",
"Volume.of.imports.of.goods.and.services",
"Volume.of.exports.of.goods.and.services",
"Unemployment.rate"
)
# Create the subset
subset_data <- final_data_averaged %>%
dplyr::select(Year, group, all_of(variables_to_include)) # qualified to avoid select() masking by MASS
# Filter subset_data for the specified years
filtered_data <- subset_data %>%
filter(Year %in% c(1991, 1996, 2001, 2006, 2011, 2016, 2021, 2023)) %>%
distinct(Year, .keep_all = TRUE)
# Create a vector of years to highlight
highlight_years <- c(1991, 1996, 2001, 2006, 2011, 2016, 2021, 2023)
# Highlight the rows corresponding to the specified years
filtered_data$Year_highlight <- ifelse(filtered_data$Year %in% highlight_years, "background-color: #FFFF00", "")
# Display the formatted table
filtered_data %>%
arrange(Year) %>%
select(-Year_highlight) %>%
kable(caption = "Filtered Data for Selected Years", align = "c") %>%
kable_styling(full_width = FALSE) %>%
row_spec(row = which(filtered_data$Year %in% highlight_years), background = "#FFFF00")
| Year | group | GSPC.Close | Gross.domestic.product..constant.prices | Gross.domestic.product.per.capita..constant.prices | Gross.domestic.product.per.capita..current.prices | Gross.domestic.product.based.on.purchasing.power.parity..PPP..share.of.world.total | Inflation..average.consumer.prices | Volume.of.imports.of.goods.and.services | Volume.of.exports.of.goods.and.services | Unemployment.rate |
|---|---|---|---|---|---|---|---|---|---|---|
| 1996 | 1 | 493.5946 | 2.775167 | 38861.29 | 27058.73 | 19.94267 | 3.094333 | 7.355167 | 7.349167 | 6.391667 |
| 2001 | 2 | 1186.8270 | 3.750800 | 44831.24 | 34436.85 | 20.11600 | 2.452400 | 9.457800 | 4.391600 | 4.473600 |
| 2006 | 3 | 1126.2335 | 2.922000 | 49032.39 | 41848.77 | 19.22820 | 2.630000 | 6.537400 | 5.237000 | 5.401600 |
| 2011 | 4 | 1205.7735 | 0.758400 | 50824.69 | 48422.15 | 17.10500 | 2.228600 | 1.159000 | 5.259400 | 7.648200 |
| 2016 | 5 | 1843.3650 | 2.157200 | 53219.21 | 54927.09 | 16.07100 | 1.308400 | 3.091800 | 2.309000 | 6.348200 |
| 2021 | 6 | 3144.5455 | 2.132200 | 57060.47 | 64296.18 | 15.79400 | 2.463400 | 3.007000 | 0.071800 | 5.078400 |
| 2023 | 7 | 4084.0928 | 2.071857 | 60417.07 | 78087.18 | 15.48829 | 6.313714 | 3.405571 | 4.720000 | 3.610714 |
This table provides insights into various aspects of US society, offering a qualitative analysis of trends observed over the years.
Regarding stock market performance, the data shows a broadly upward trend in the S&P 500 index from 1996 to 2023, despite interim dips, indicating overall long-run growth in the stock market.
Economic growth is reflected in the increasing trend of Gross Domestic Product (GDP) measures. Both GDP per capita at constant and current prices show steady growth over time, suggesting improvements in living standards and economic prosperity.
Inflation rates fluctuate over the years but mostly remain moderate, with the elevated 2023 figure (6.3%) a notable exception.
Trade volume, represented by the volume of imports and exports, demonstrates fluctuations influenced by global economic conditions, trade policies, and exchange rates.
Lastly, the unemployment rate shows variations over time, influenced by economic growth, business cycles, and government policies, highlighting the dynamic nature of labor market conditions.
Overall, these metrics provide valuable insights into the economic landscape of the United States, offering a nuanced understanding of trends and patterns shaping various aspects of society.
In summary, this project comprehensively analyzed the United States’ economic indicators, housing prices, and air quality trends. By employing advanced statistical techniques and geospatial visualization, it uncovered significant correlations, identified key trends, and provided valuable insights into the complex dynamics shaping societal and environmental factors. These findings have the potential to contribute to a deeper understanding of regional disparities and inform evidence-based strategies for addressing housing affordability in the United States.